Performance Modelling for Parallel PDE Solvers on NUMA-Systems
نویسنده
چکیده
A detailed model of the memory performance of a PDE solver running on a NUMA-system is set up. Due to the complexity of modern computers, such a detailed model inevitably is very complicated. Therefore, approximations are introduced that simplify the model and allows NUMA-systems and PDE solvers to be described conveniently. Using the simpli ed model, it is shown that PDE solvers using ordered local methods can be made very unsensitive to high NUMA-ratios, allowing them to scale well on virtually any NUMA-system. PDE solvers using unordered local methods, semiglobal methods or global methods are more sensitive to high NUMA-ratios and require special techniques in order to scale well beyond a single locality group. Nevertheless, the potential performance gain of improving the data distribution on a NUMA-system can be considerable for all kinds of PDE solvers studied.
منابع مشابه
Performance of PDE solvers on a self-optimizing NUMA architecture
The performance of shared-memory (OpenMP) implementations of three different PDE solver kernels representing finite difference methods, finite volume methods, and spectral methods has been investigated. The experiments have been performed on a self-optimizing NUMA system, the Sun Orange prototype, using different data placement and thread scheduling strategies. The results show that correct dat...
متن کاملImproving Geographical Locality of Data for Shared Memory Implementations of PDE Solvers
On cc-NUMA multi-processors, the non-uniformity of main memory latencies motivates the need for co-location of threads and data. We call this special form of data locality, geographical locality, as the non-uniformity is a consequence of the physical distance between the cc-NUMA nodes. In this article, we compare the well established method of exploiting the rst-touch strategy using parallel in...
متن کاملGeographical Locality and Dynamic Data Migration for OpenMP Implementations of Adaptive PDE Solvers
On cc-NUMA multi-processors, the non-uniformity of main memory latencies motivates the need for co-location of threads and data. We call this special form of data locality, geographical locality. In this article, we study the performance of a parallel PDE solver with adaptive mesh refinement. The solver is parallelized using OpenMP and the adaptive mesh refinement makes dynamic load balancing n...
متن کاملPerformance of High-Accuracy PDE Solvers on a Self-Optimizing NUMA Architecture
High-accuracy PDE solvers use multi-dimensional fast Fourier transforms. The FFTs exhibits a static and structured memory access pattern which results in a large amount of communication. Performance analysis of a non-trivial kernel representing a PDE solution algorithm has been carried out on a Sun WildFire computer. Here, different architecture, system and programming models can be studied. Th...
متن کاملMapping Algorithms and Software Environment for Data Parallel PDE Iterative Solvers
We consider computations associated with data parallel iterative solvers used for the numerical solution of Partial Di erential Equations (PDEs). The mapping of such computations into load balanced tasks requiring minimum synchronization and communication is a di cult combinatorial optimization problem. Its optimal solution is essential for the e cient parallel processing of PDE computations. D...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2006